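Longitudinal image registration is challenging and has not yet benefited from major performance improvements thanks to deep learning. Inspired by Deep Image Prior, this paper introduces a different use of deep architectures, as regularizers, to tackle the image registration problem. We propose a subject-specific deformable registration method called MIRRBA, which relies on a deep pyramidal architecture as the prior parametric model constraining the deformation field. MIRRBA does not require a training database, only the pair of images to be registered, which is used to optimize the network's parameters and produce a deformation field. We demonstrate the regularizing power of deep architectures and present new elements for understanding the role of the architecture in deep learning methods for registration. To study the impact of the network parameters, we ran our method with different architectural configurations on a private dataset of 110 metastatic breast cancer full-body PET images with manual segmentations of the brain, bladder, and metastatic lesions. We compared it against conventional iterative registration approaches and supervised deep learning-based models. Global and local registration accuracy were evaluated using the detection rate and the Dice score, while registration realism was evaluated using the Jacobian determinant. Moreover, we measured the ability of the different methods to shrink vanishing lesions using the disappearing rate. MIRRBA significantly improves on the organ and lesion Dice scores of supervised models. Regarding the disappearing rate, MIRRBA more than doubles the score of SYNCC, the best-performing conventional approach. Our work therefore proposes an alternative way to bridge the performance gap between conventional and deep learning-based methods and demonstrates the regularizing power of deep architectures.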
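As a rough illustration of the Dice score mentioned in the abstract above, the sketch below computes the overlap between two binary segmentation masks. It is not taken from the paper: the function name `dice_score`, the flat-list mask representation, and the convention for two empty masks are all assumptions made for this example.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total foreground voxels across the two masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy masks: a warped segmentation and the fixed-image segmentation.
warped_mask = [0, 1, 1, 1, 0, 0]
fixed_mask = [0, 0, 1, 1, 1, 0]
score = dice_score(warped_mask, fixed_mask)
print(f"Dice = {score:.3f}")
```

A score of 1 means the two masks coincide and 0 means they do not overlap, so a higher score after registration suggests that the corresponding organ or lesion is better aligned.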
Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their capability of incorporating physical laws into the models. PINNs integrate the physical constraints by minimizing the partial differential equation (PDE) residuals on a set of collocation points. The distribution of these collocation points appears to have a huge impact on the performance of PINNs, and the assessment of sampling methods for these points is still an active topic. In this paper, we propose a Fixed-Budget Online Adaptive Mesh Learning (FBOAML) method, which decomposes the domain into sub-domains, for training collocation points based on local maxima and local minima of the PDE residuals. The stopping criterion is based on a reference data set, which leads to an adaptive number of iterations for each specific problem. The effectiveness of FBOAML is demonstrated in the context of non-parameterized and parameterized problems. The impact of the hyper-parameters in FBOAML is investigated in this work. The comparison with other adaptive sampling methods is also illustrated. The numerical results demonstrate important gains in terms of accuracy of PINNs with FBOAML over the classical PINNs with non-adaptive collocation points. We also apply FBOAML in a complex industrial application involving coupling between mechanical and thermal fields. We show that FBOAML is able to identify the high-gradient location and even give better prediction for some physical fields than the classical PINNs with collocation points taken on a pre-adapted finite element mesh.
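The idea of relocating collocation points under a fixed budget can be sketched loosely as follows. This is not the authors' FBOAML algorithm: the 1D domain, the `toy_residual` function standing in for a PDE residual, the helper `adapt_points`, and the rule of swapping the lowest-residual points for per-sub-domain residual maxima are all assumptions made purely for illustration.

```python
import math


def toy_residual(x):
    # Hypothetical stand-in for |PDE residual|; peaks sharply near x = 0.8.
    return math.exp(-200.0 * (x - 0.8) ** 2)


def adapt_points(points, residual, n_subdomains=4, n_candidates=50):
    """Return a relocated point set of the same size (fixed budget) on [0, 1]."""
    if len(points) < n_subdomains:
        raise ValueError("need at least one point per sub-domain")
    # 1. In each sub-domain, find the candidate location with the largest residual.
    width = 1.0 / n_subdomains
    maxima = []
    for k in range(n_subdomains):
        candidates = [k * width + width * (i + 0.5) / n_candidates
                      for i in range(n_candidates)]
        maxima.append(max(candidates, key=residual))
    # 2. Drop as many existing points as were added, lowest residual first,
    #    so the total number of collocation points never changes.
    n_keep = len(points) - len(maxima)
    kept = sorted(points, key=residual, reverse=True)[:n_keep]
    return sorted(kept + maxima)


# Start from a uniform set of 20 collocation points.
points = [(i + 0.5) / 20 for i in range(20)]
new_points = adapt_points(points, toy_residual)
print(len(points), len(new_points))
```

In an actual PINN training loop the residual would be re-evaluated from the current network between relocation steps; here it is a fixed function so that the example stays self-contained.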
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
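The speed/accuracy tradeoff described above comes from how the patch size sets the sequence length. The sketch below only illustrates that arithmetic together with per-step patch-size sampling; the image size of 240, the list of patch sizes, and the helper names are assumptions for this example, and the resizing of model weights that a real implementation needs is not covered.

```python
import random

IMAGE_SIZE = 240  # assumed square input resolution
# Assumed candidate patch sizes; each one divides IMAGE_SIZE evenly.
PATCH_SIZES = [8, 10, 12, 15, 16, 20, 24, 30, 40, 48]


def num_tokens(image_size, patch_size):
    """Number of patches (tokens) a square image is sliced into."""
    if image_size % patch_size != 0:
        raise ValueError("patch size must divide the image size")
    side = image_size // patch_size
    return side * side


def sample_patch_size(rng):
    """Pick one patch size uniformly at random for the current training step."""
    return rng.choice(PATCH_SIZES)


rng = random.Random(0)
for step in range(3):
    patch = sample_patch_size(rng)
    print(f"step {step}: patch size {patch}, tokens {num_tokens(IMAGE_SIZE, patch)}")
```

Under these assumptions the sequence length ranges from 25 tokens (patch size 48) to 900 tokens (patch size 8), which is why smaller patches cost more compute.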
Most benchmarks for studying surgical interventions focus on a specific challenge instead of leveraging the intrinsic complementarity among different tasks. In this work, we present a new experimental framework towards holistic surgical scene understanding. First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes annotations for both long-term (Phase and Step recognition) and short-term reasoning (Instrument detection and novel Atomic Action recognition) in robot-assisted radical prostatectomy videos. Second, we present Transformers for Action, Phase, Instrument, and steps Recognition (TAPIR) as a strong baseline for surgical scene understanding. TAPIR leverages our dataset's multi-level annotations as it benefits from the learned representation on the instrument detection task to improve its classification capacity. Our experimental results in both PSI-AVA and other publicly available databases demonstrate the adequacy of our framework to spur future research on holistic surgical scene understanding.
Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step to improve models on a task like semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g. image classification, image-text alignment a la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which might be suboptimal when finetuning on downstream tasks with spatial reasoning. In this work, we propose to pretrain networks for semantic segmentation by predicting the relative location of image parts. We formulate this task as a classification problem where each patch in a query view has to predict its position relative to another reference view. We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query. Our experiments show that this location-aware (LOCA) self-supervised pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
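We summarize the model and results of PirouNet, a semi-supervised recurrent autoencoder. Given a small number of dance sequences labeled with qualitative choreographic annotations, PirouNet conditionally generates dance sequences in the style of the choreographer.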
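Bias in datasets can be very detrimental to appropriate statistical estimation. To address this problem, importance weighting methods have been developed to match any biased distribution to its corresponding unbiased target distribution. The seminal Kernel Mean Matching (KMM) method is still considered state of the art in this research field. However, one of the main drawbacks of this method is its computational burden on large datasets. Building on previous works by Huang et al. (2007) and de Mathelin et al. (2021), we derive a novel importance weighting algorithm that scales to large datasets by using a neural network to predict the instance weights. We show, on multiple public datasets and under various sample biases, that our proposed approach drastically reduces the computational time on large datasets while maintaining sample bias correction performance similar to that of other importance weighting methods. The proposed approach appears to be the only one able to give relevant reweighting in a reasonable time for large datasets with up to two million data points.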
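To make the notion of importance weighting concrete, the sketch below reweights a biased sample so that its weighted mean matches the target distribution. It is only a toy illustration, not the method of the abstract above: the density ratio is assumed to be known exactly for two hypothetical groups, whereas KMM and the proposed approach estimate or predict the weights. All names and numbers are made up for this example.

```python
def importance_weights(groups, target_probs, biased_probs):
    """Weight each instance by the ratio target probability / biased probability."""
    return [target_probs[g] / biased_probs[g] for g in groups]


def weighted_mean(values, weights):
    """Self-normalized weighted average of the values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical biased sample: group "a" is over-represented (80% instead of 50%).
groups = ["a"] * 8 + ["b"] * 2
values = [1.0] * 8 + [5.0] * 2

weights = importance_weights(
    groups,
    target_probs={"a": 0.5, "b": 0.5},  # assumed unbiased target distribution
    biased_probs={"a": 0.8, "b": 0.2},  # assumed sampling distribution
)

naive = sum(values) / len(values)
corrected = weighted_mean(values, weights)
print(f"naive mean = {naive:.2f}, reweighted mean = {corrected:.2f}")
```

With equal group shares in the target, the true mean of this toy population is 3.0; the unweighted mean is pulled toward the over-represented group, and the reweighted mean corrects for that.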
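Using Artificial Intelligence (AI) to create dance choreography with intention is still at an early stage. Methods that conditionally generate dance sequences remain limited in their ability to follow choreographer-specific creative intentions, often relying on external prompts or supervised learning. Likewise, fully annotated dance datasets are rare and labor intensive to produce. To fill this gap and help make deep learning a meaningful tool for choreographers, we propose "PirouNet", a semi-supervised conditional recurrent autoencoder together with a dance labeling web application. PirouNet allows dance professionals to annotate data with their own subjective creative labels and then generate new choreography based on their aesthetic criteria. Thanks to the proposed semi-supervised approach, PirouNet only requires a small portion of the dataset to be labeled, typically on the order of 1%. We demonstrate PirouNet's capabilities as it generates original choreography based on the "Laban Time Effort", an established dance notion describing the intention behind a movement's time dynamics. We extensively evaluate PirouNet's dance creations through a series of qualitative and quantitative metrics, validating its applicability as a tool for choreographers.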
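Physics-Informed Neural Networks (PINNs) have gained much attention in various fields of engineering thanks to their capability of incorporating physical laws into the models. However, the assessment of PINNs in industrial applications involving coupling between mechanical and thermal fields is still an active research topic. In this work, we present an application of PINNs to a non-Newtonian fluid thermo-mechanical problem that is often considered in the rubber calendering process. We demonstrate the effectiveness of PINNs when dealing with inverse and ill-posed problems, which are impractical to solve with classical numerical discretization methods. We study the impact of the placement of the sensors and the distribution of unsupervised points on the performance of PINNs in a problem of inferring hidden physical fields from partial data. We also investigate the capability of PINNs to identify unknown physical parameters from the measurements captured by sensors. The effect of noisy measurements is also considered throughout this work. The results of this paper demonstrate that, in the identification problem, PINNs can successfully estimate the unknown parameters using only the measurements at the sensors. In ill-posed problems where the boundary conditions are not completely defined, even though the placement of the sensors and the distribution of unsupervised points have a great impact on the performance of PINNs, we show that the algorithm is able to infer the hidden physics from local measurements.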